{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9e773fec",
   "metadata": {},
   "source": [
    "## numpy相关的几个常数\n",
    "\n",
    "numpy.Inf:浮点表示（正）无穷大。\n",
    "numpy.NAN:浮点表示非数字（NaN）。NaN 和 NAN 是 nan 的等价定义。\n",
    "上面两个常量都为浮点数。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1805f3a8",
   "metadata": {},
   "source": [
    "## 一、numpy.NaN的特性"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "2ca749bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "998f1927",
   "metadata": {},
   "source": [
    "* NaN在numpy的定义中是浮点数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "77f71967",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[3, 6, 2, 7, 3],\n",
       "       [7, 7, 2, 3, 8],\n",
       "       [7, 7, 4, 3, 2]])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = np.random.randint(0,9,size=(3,5))\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "87312a0d",
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "cannot convert float NaN to integer",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m/tmp/ipykernel_14956/1485986404.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m#此处由于整个数组为整数(int)，因此np.NAN无法写入数组\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mNAN\u001b[0m  \u001b[0;31m#nan\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m: cannot convert float NaN to integer"
     ]
    }
   ],
   "source": [
    "#此处由于整个数组为整数(int)，因此np.NAN无法写入数组\n",
    "data[0,1]=np.NAN  #nan,NaN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "ac60e4e0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 3. nan  2.  7.  3.]\n",
      " [ 7.  7.  2.  3.  8.]\n",
      " [ 7.  7.  4.  3.  2.]]\n"
     ]
    }
   ],
   "source": [
    "data = data.astype(np.float16)\n",
    "data[0,1]=np.NaN\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f454dbd0",
   "metadata": {},
   "source": [
    "* NaN和NaN是不相等的"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4944f33c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.NaN==np.NaN"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1229ad83",
   "metadata": {},
   "source": [
    "* NaN和任何值运算都等于NaN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4c3204c7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "100*np.NaN"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5fc9581f",
   "metadata": {},
   "source": [
    "## 二、如何去除数据中的NaN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "b8f0dc5d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[3., 1., 8., 4., 3.],\n",
       "       [6., 2., 5., 4., 5.],\n",
       "       [2., 0., 8., 7., 0.]], dtype=float16)"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = np.random.randint(0,9,size=(3,5))\n",
    "data=data.astype(np.float16)\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "fcd2153c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 3., nan,  8.,  4.,  3.],\n",
       "       [ 6.,  2.,  5.,  4.,  5.],\n",
       "       [ 2.,  0.,  8., nan,  0.]], dtype=float16)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0,1]=np.nan\n",
    "data[2,3]=np.nan\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "356792dc",
   "metadata": {},
   "source": [
    "### 1、isnan()函数，布尔数组简单提出nan数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "da39562e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[False,  True, False, False, False],\n",
       "       [False, False, False, False, False],\n",
       "       [False, False, False,  True, False]])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用isnan()函数生成布尔数组\n",
    "np.isnan(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "40f7409b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3., 8., 4., 3., 6., 2., 5., 4., 5., 2., 0., 8., 0.], dtype=float16)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#将nan数据直接剔除。但会形成一个一维数组，这样就会除掉原来的数组结构。\n",
    "#此时实际上是针对原数组的切片操作，属于视图view,并不改变原数组。\n",
    "data[~np.isnan(data)]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "021c4cfd",
   "metadata": {},
   "source": [
    "### 2、np.delete()+where()函数，保留结构的函数删除nan数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "5e959938",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.  ,  2.7 ,  5.4 ,  1.35, 10.8 ],\n",
       "       [ 1.35,  9.45,  6.75,  9.45,  9.45],\n",
       "       [ 9.45,  1.35,  5.4 ,  5.4 ,  2.7 ]])"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#delete精确删除数组中的行列数。乘以小数后自动变为float64数据类型。\n",
    "data1=np.random.randint(0,9,size=(3,5))*1.35\n",
    "data1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "e6c54b3e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('float64')"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data1.dtype"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "f424fd06",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.  ,  2.7 ,  5.4 ,  1.35, 10.8 ],\n",
       "       [ 1.35,  9.45,  6.75,   nan,  9.45],\n",
       "       [ 9.45,  1.35,  5.4 ,  5.4 ,   nan]])"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data1[[1,2],[3,4]]=np.nan\n",
    "data1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "d1ec9ccf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([1, 2]), array([3, 4]))"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#用where函数找到nan位置，返回位置元组。元组1表示存在的行，元组2表示各列。\n",
    "lines=np.where(np.isnan(data1))\n",
    "lines"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "53d5a376",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.  ,  2.7 ,  5.4 ,  1.35, 10.8 ]])"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#delete()函数，整行整列删除。删除后不改变原始的数据。\n",
    "np.delete(data1,lines[0],axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9522ea0",
   "metadata": {},
   "source": [
    "### 3、替换为某个数值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "226dbf86",
   "metadata": {},
   "outputs": [],
   "source": [
    "#默认读取数据时是自动转换成np.float格式。但存在空数据时，由于无法转换统一数据，那么会读取报错。\n",
    "#存在空数据时可以首先以字符串的形式读取，再寻求其他的转换方法。\n",
    "data2=np.loadtxt('nan.csv',delimiter=',',skiprows=1,dtype=np.str_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "3b496148",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([['60', '88'],\n",
       "       ['58', '0.0'],\n",
       "       ['0.0', '98'],\n",
       "       ['80', '69'],\n",
       "       ['69', '79'],\n",
       "       ['267', '334']], dtype='<U3')"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data2[data2=='']=0.0\n",
    "data2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "59f71efe",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 60.,  88.],\n",
       "       [ 58.,   0.],\n",
       "       [  0.,  98.],\n",
       "       [ 80.,  69.],\n",
       "       [ 69.,  79.],\n",
       "       [267., 334.]])"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data3=data2.astype(np.float64)\n",
    "data3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc16fa05",
   "metadata": {},
   "source": [
    "### 3、求平均参数及总和的考虑"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "id": "b8fe1df0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([['60', '88'],\n",
       "       ['58', 'nan'],\n",
       "       ['nan', '98'],\n",
       "       ['80', '69'],\n",
       "       ['69', '79'],\n",
       "       ['52', '78']], dtype='<U25')"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 当设置dtype=str_时，默认为2U，字符串为'na'，此时不能转换为float的NaN。\n",
    "# 将dtype='U25'，时能够在字符串和float之间的NAN转换。查阅文档可知字符串的具体类型。\n",
    "data4=np.loadtxt('nan.csv',delimiter=',',skiprows=1,encoding='utf-8',dtype='U25')\n",
    "data4[data4==\"\"]=np.NAN\n",
    "data4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "id": "306b8ac7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[60., 88.],\n",
       "       [58., nan],\n",
       "       [nan, 98.],\n",
       "       [80., 69.],\n",
       "       [69., 79.],\n",
       "       [52., 78.]])"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data5=data4.astype(np.float64)\n",
    "data5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "id": "61877bf2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[60., 88.],\n",
       "       [58.,  0.],\n",
       "       [ 0., 98.],\n",
       "       [80., 69.],\n",
       "       [69., 79.],\n",
       "       [52., 78.]])"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data5[np.isnan(data5)]=0\n",
    "data5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "id": "c2f5297e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([148.,  58.,  98., 149., 148., 130.])"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#参数axis的用法。delete函数axis-0表示行，之外的所有行数axis=1表示行。\n",
    "math_sum=data5.sum(axis=1)\n",
    "math_sum"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "id": "92ebae68",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "63.8\n",
      "82.4\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([[60. , 88. ],\n",
       "       [58. , 82.4],\n",
       "       [63.8, 98. ],\n",
       "       [80. , 69. ],\n",
       "       [69. , 79. ],\n",
       "       [52. , 78. ]])"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 求平均值。此处出于统计的原因，不能直接将nan直接替换为0，否则平均值不准确。\n",
    "# nan替换为除nan之外的平均值即可。在画图时比较有用。\n",
    "data6=data4.astype(np.float64)\n",
    "for x in range(data6.shape[1]):\n",
    "    col = data6[:,x]\n",
    "    #获取除nan值之外的数据的平均值\n",
    "    #除XX之外在numpy中选取函数中表示为～\n",
    "    mean = col[~np.isnan(col)].mean()\n",
    "    print(mean)\n",
    "    col[np.isnan(col)]=mean\n",
    "data6"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "d0df01f3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6, 2)"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data4.shape"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
